
Autoscheduling for sparse tensor algebra with an asymptotic cost model

Published: 09 June 2022

ABSTRACT

While loop reordering and fusion can have a large impact on the constant-factor performance of dense tensor programs, their effects on sparse tensor programs are asymptotic, often leading to orders-of-magnitude performance differences in practice. Sparse tensors also introduce a choice of compressed storage formats that can likewise have asymptotic effects. Research into sparse tensor compilers has led to simplified languages that express these tradeoffs, but the user is expected to provide a schedule that makes the decisions. This is challenging because schedulers must anticipate the interaction between sparse formats, loop structure, potential sparsity patterns, and the compiler itself. Automating this decision-making process stands to finally make sparse tensor compilers accessible to end users.
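To make the asymptotic stakes concrete, here is a minimal Python sketch (our illustration, not code from the paper) of two schedules for the dot product of sparse vectors stored as sorted (index, value) lists; the function names `dot_merge` and `dot_scan` and their inputs are hypothetical. The first co-iterates over the two coordinate lists in O(nnz(a) + nnz(b)) time; the second densifies one operand into a length-n buffer first, adding an O(n) term that dominates when n is much larger than the number of nonzeros.

```python
def dot_merge(a, b):
    """Two-finger merge over sorted coordinates: O(nnz(a) + nnz(b))."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        ia, va = a[i]
        ib, vb = b[j]
        if ia == ib:          # index present in both vectors
            total += va * vb
            i += 1
            j += 1
        elif ia < ib:         # advance whichever list is behind
            i += 1
        else:
            j += 1
    return total

def dot_scan(a, b, n):
    """Densify b into a length-n buffer, then scan a: O(n + nnz(a) + nnz(b)).
    When n >> nnz, this schedule is asymptotically worse than the merge."""
    dense_b = [0.0] * n
    for ib, vb in b:
        dense_b[ib] = vb
    return sum(va * dense_b[ia] for ia, va in a)

# Both schedules compute the same value; only their costs differ.
a = [(0, 2.0), (5, 1.5)]
b = [(5, 4.0), (9, 3.0)]
assert dot_merge(a, b) == dot_scan(a, b, 10) == 6.0
```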

We present, to the best of our knowledge, the first automatic asymptotic scheduler for sparse tensor programs. We provide an approach to abstractly represent the asymptotic cost of schedules and to choose between them. We narrow the search space down to a manageably small Pareto frontier of asymptotically non-dominated kernels. We test our approach by compiling these kernels with the TACO sparse tensor compiler and comparing them with those generated with the default TACO schedules. Our results show that our approach reduces the scheduling space by orders of magnitude and that the generated kernels perform asymptotically better than those generated using the default schedules.
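The Pareto-frontier idea can be sketched in a few lines. The following Python is a hypothetical illustration of the filtering step only, not the paper's actual cost representation or algorithm: each candidate's asymptotic cost is modeled as a set of monomials over symbolic size parameters (so {"n", "nnz"} stands for O(n * nnz)), one cost dominates another when each of its terms divides some term of the other, and only candidates that no rival strictly dominates survive.

```python
def term_dominates(s, t):
    """Monomial s is asymptotically <= monomial t if s's factors are a subset of t's."""
    return s <= t

def cost_dominates(x, y):
    """Cost x is no worse than cost y if every term of x is covered by a term of y."""
    return all(any(term_dominates(s, t) for t in y) for s in x)

def pareto_frontier(candidates):
    """Keep schedules whose asymptotic cost no other candidate strictly beats."""
    frontier = []
    for name, cost in candidates:
        strictly_beaten = any(
            cost_dominates(other, cost) and not cost_dominates(cost, other)
            for _, other in candidates
        )
        if not strictly_beaten:
            frontier.append((name, cost))
    return frontier

# Hypothetical candidates echoing the earlier sketch: "merge" touches only
# nonzeros, while "scan" also pays for a dense dimension of size n.
candidates = [
    ("merge", frozenset({frozenset({"nnz"})})),
    ("scan", frozenset({frozenset({"nnz"}), frozenset({"n"})})),
]
print(pareto_frontier(candidates))  # only "merge" survives
```

Because costs here are symbolic rather than numeric, the filter keeps every schedule that could be fastest on some input, which is what makes the surviving set a frontier rather than a single winner.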

